Abstract

A classic problem in physics is the origin of fat tailed distributions generated by complex systems.
We study the distributions of stock returns measured over different time lags $\tau$. We find that
destroying all correlations without changing the $\tau = 1$ d distribution, by shuffling the order of the
daily returns, causes the fat tails almost to vanish for $\tau > 1$ d. We argue that the fat tails are
caused by known long-range volatility correlations. Indeed, destroying only sign correlations, by
shuffling the order of only the signs (but not the absolute values) of the daily returns, allows the
fat tails to persist for $\tau > 1$ d.

Over the last few decades, remarkable progress has been made in quantitatively describing
non-Gaussian phenomena, including those observed in economic [1] and social systems,
which are typically characterized by the presence of fat tailed Lévy distributions. The
behavior of financial markets has recently become a focus of interest to physicists, as well as
an area of active research, because of its rich and complex dynamics [13]. One open
question relates to the probability distribution underlying returns on stock markets. It is
well known that the century-old Gaussian model underestimates the probability of large
events. Indeed, the distribution of stock returns is fat tailed [13]. On the one hand,
the fat tails could be due to an underlying Lévy distribution. According to the generalized
central limit theorem, the sum of $\tau$ independent (i.e., uncorrelated) Lévy distributed random
variables is also Lévy distributed, such that the persistence of fat tails for large $\tau$ is due
solely to the Lévy nature of the original ($\tau = 1$) distribution. Furthermore, if the $\tau = 1$
distribution is Lévy with exponential truncation of the tails, then we still expect a certain
degree of persistence of the fat tails. On the other hand, fat tails can also persist [10] for
$\tau > 1$ due to long-range correlations in a "hidden variable" such as volatility (i.e., locally
measured standard deviation). Moreover, such long-range correlations have been found to
produce fat tails. Although stock returns lack long-range power law correlations, the
absolute values of the returns are known to be long-range correlated.
The absolute returns are power law correlated with non-unique scaling exponents.
Here we test the hypothesis (see ref. [10]) that long-range volatility correlations are the
origin of the fat tails.

Below we will show that for stock market returns the observed persistence of fat tails for
large $\tau$ cannot be explained without long-range correlations in the volatility. Shuffling the
daily returns has the effect of destroying all correlations while leaving the
$\tau = 1$ d distribution unchanged. For shuffled data, we will show that the distribution is fat tailed for
$\tau = 1$ d but not for $\tau > 1$ d. We interpret this finding as evidence that volatility correlations
rather than the $\tau = 1$ d Lévy-like distribution are responsible for the existence of fat tails
for large $\tau$. Indeed, we will also show that shuffling only the signs of the returns allows the
fat tails to persist for $\tau > 1$ d and the distribution does not converge to a Gaussian. These
findings conclusively prove that known long-range volatility correlations (rather than known
short-range [12] sign correlations) are responsible for fat tails for any $\tau > 1$ d. We will also
show that, remarkably, the short-range (1-2 d) sign correlations also play an important role
in the distribution properties of small price changes, and we will propose an explanation for
why a Gaussian fits the data so well for small, but not large, returns.

Our dataset consists of the base 10 logarithms of daily returns obtained from 59 stock
market indices (obtained from yahoo.com: AEX, AORD, ATG, ATX, BFX, BSESN, BVL30,
BVSP, CCSI, DJA, DJI, DJT, DJU, DOT, FCHI, FTSE, HEX, HSI, IBC, IGRA, IIX, IPSA,
IXIC, JKSE, KFX, KLSE, KS11, KSE, MERV, MID, MTMS, MXX, N225, NDX, NTOT,
NYA, NZ40, OEX, PSE, PSI, PX50, RUA, RUI, RUT, SAX, SETI, SML, SMSI, SOOX,
SPG, SSEG, SSMI, STI, TA100, TSE, TWII, VLIG, XMI, XU100). The returns $r(t)$ are
defined in terms of the prices P(t) by 

$$ r(t) \equiv \log_{10} P(t+1) - \log_{10} P(t) . \tag{1} $$

We normalize the returns to unit variance for each market index separately. To be able to
compare returns measured over differing time scales, we also define a rescaled return $r_\tau$ by

$$ r_\tau(t) \equiv \frac{1}{\sqrt{\tau}} \sum_{t'=t}^{t+\tau-1} r(t') , \tag{2} $$

with the time lag $\tau$ measured in days and $r_1 = r$. The daily and rescaled returns play roles similar to
those played by "bare" and "dressed" quantities in field theory. Note that for uncorrelated
(independent) and unitary Gaussian distributed returns, their variances will be identical due
to the central limit theorem: $\sigma(r_1) = \sigma(r_\tau) = 1$. Similarly, if $r_1(t)$ are Lévy distributed,
then $r_\tau(t)$ will also be Lévy distributed.
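
The return definitions above can be sketched numerically. This is a minimal illustration rather than the authors' code: the function names `log10_returns` and `rescaled_returns` are hypothetical, the forward-difference indexing and the use of overlapping windows are assumptions, and the synthetic Gaussian series merely stands in for real index data.

```python
import numpy as np


def log10_returns(prices):
    """Base-10 log returns of a price series, scaled to unit variance."""
    p = np.asarray(prices, dtype=float)
    r = np.diff(np.log10(p))  # log10 P(t+1) - log10 P(t); indexing is an assumption
    return r / r.std()


def rescaled_returns(r, tau):
    """Sum of tau consecutive daily returns divided by sqrt(tau).

    Overlapping windows are used (an assumption); tau = 1 gives back r
    up to floating-point rounding.
    """
    r = np.asarray(r, dtype=float)
    csum = np.concatenate(([0.0], np.cumsum(r)))
    return (csum[tau:] - csum[:-tau]) / np.sqrt(tau)


# Synthetic stand-in for one index: independent unit-variance Gaussian returns.
rng = np.random.default_rng(0)
r1 = rng.standard_normal(200_000)
for tau in (1, 10, 100):
    # Each standard deviation should be close to 1 for independent returns.
    print(tau, rescaled_returns(r1, tau).std())
```

The cumulative-sum trick keeps the cost linear in the series length for any lag, which is convenient when many lags are scanned.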

Even for Gaussian returns, however, the presence of correlations can lead to anomalous
behavior, such that $r_1$ and $r_\tau$ may have non-identical probability distributions. We therefore
develop a method to "subtract" the effects of correlations. For each of the 59 time series, we
generate a modified control time series by shuffling the order of the daily returns (Fig. 1).
This shuffled daily returns model (SDRM) will have a probability distribution identical to
the real data for $\tau = 1$ d, but lacks all correlations. Hence, for $\tau > 1$ d the real data and
the SDRM will in general not have identical distributions (Fig. 2) unless correlations are
lacking. Thus, we now have a way to test the hypothesis that the fat tails in $p(r_\tau)$ persist
solely due to correlations. If the probability density distribution $p(r_\tau)$ is fat tailed for the
real data but not for the SDRM, then the conclusion would be that the fat tails in $p(r_\tau)$ are
due to correlations.
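
The SDRM construction can be sketched as follows. This is an illustrative sketch under stated assumptions: `shuffled_daily_returns` and `lag1_autocorr` are hypothetical names, and the toy series with block-wise alternating volatility is an invented stand-in for market data, chosen only so that the loss of correlations is easy to see.

```python
import numpy as np


def shuffled_daily_returns(r, rng):
    """SDRM control: randomly permute the order of the daily returns."""
    return rng.permutation(np.asarray(r, dtype=float))


def lag1_autocorr(x):
    """Sample autocorrelation of x at a lag of one day."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    return np.dot(x[:-1], x[1:]) / np.dot(x, x)


# Toy series (an assumption, not market data): Gaussian noise whose
# volatility alternates between calm and turbulent blocks of 500 days.
rng = np.random.default_rng(1)
vol = np.where((np.arange(20_000) // 500) % 2 == 0, 0.5, 2.0)
real = vol * rng.standard_normal(vol.size)
control = shuffled_daily_returns(real, rng)

# The one-day distribution is untouched: same values, different order.
same_values = np.array_equal(np.sort(real), np.sort(control))
print(same_values)
# Volatility clustering shows up in |r| for the toy series but not the control.
print(lag1_autocorr(np.abs(real)), lag1_autocorr(np.abs(control)))
```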

Shuffling the data destroys all kinds of correlations; indeed, the data become independent
numbers. Specifically, shuffling destroys the known long-range power law correlations in the
volatility of the returns as well as the short-range correlations in the signs of the returns.
We thus develop a method for "subtracting" only the correlations in the signs of the returns
$r_1(t)$ while preserving (volatility) correlations in the absolute returns (Fig. 1). For
each of the 59 time series, we generate a second modified control time series by shuffling the
order of the signs, but not of the absolute values, of the daily returns. This shuffled signs
return model (SSRM) will have a symmetrized probability distribution identical to the real
data and to the SDRM for $\tau = 1$ d, but not necessarily for $\tau > 1$ d.
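
The sign-shuffling step can be sketched in the same spirit. The function name `shuffled_signs_returns` is hypothetical, the short array is made-up example data, and treating an exactly zero return as positive is an assumption made so that every absolute value is preserved.

```python
import numpy as np


def shuffled_signs_returns(r, rng):
    """SSRM control: permute the signs, keep each absolute value in place."""
    r = np.asarray(r, dtype=float)
    # Zero returns are treated as positive (an assumption of this sketch).
    signs = np.where(r < 0, -1.0, 1.0)
    return rng.permutation(signs) * np.abs(r)


rng = np.random.default_rng(2)
real = np.array([0.3, -1.2, 0.7, -0.1, 2.5, -0.4])
control = shuffled_signs_returns(real, rng)
print(np.abs(control))  # same absolute values, in the same order as `real`
```

Because the absolute values never move, any temporal structure in the volatility survives, while the order of the signs is randomized.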

We study the symmetrized probability density distribution function $p(r_\tau)$ of the returns
$r_\tau$ from 59 stock markets and compare them to those of the SDRM and the SSRM. We
focus on the fat tailed regions of the distributions by studying a properly defined modified
characteristic function

$$ f(\tau) = \int dr_\tau \, p(r_\tau) \exp\left[-(|r_\tau| - r_0)^2\right] \approx \frac{1}{N} \sum_t \exp\left[-(|r_\tau(t)| - r_0)^2\right] , \tag{3} $$

where $t$ is time in days. In order to study the fat tailed region while retaining good statistics,
we chose a value $r_0 = 5$ corresponding to 5 standard deviations. (We also studied higher
moments, but these are extremely sensitive to large events, rendering the results not
statistically significant.) Similarly, to study the central bell curve region, we define a second
function

$$ g(\tau) = \int dr_\tau \, p(r_\tau) \exp\left(-r_\tau^2\right) \approx \frac{1}{N} \sum_t \exp\left(-r_\tau(t)^2\right) , \tag{4} $$

where = 1. In practice, we calculate these functions directly from the returns, rather 
than through the distributions, to obtain better statistics. 
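
Computing the two functions directly from the returns, as described above, might look like the following sketch. The names `tail_function` and `center_function` are hypothetical, the Gaussian weight in the second function is assumed to have unit width, and the Gaussian sample is a synthetic reference rather than data from the paper.

```python
import numpy as np


def tail_function(r_tau, r0=5.0):
    """Sample estimate of f: mean of exp(-(|r| - r0)^2) over the returns."""
    r_tau = np.asarray(r_tau, dtype=float)
    return np.mean(np.exp(-(np.abs(r_tau) - r0) ** 2))


def center_function(r_tau):
    """Sample estimate of g: mean of exp(-r^2) over the returns."""
    r_tau = np.asarray(r_tau, dtype=float)
    return np.mean(np.exp(-r_tau ** 2))


# Reference values for a unit-variance Gaussian (synthetic stand-in).
rng = np.random.default_rng(3)
gauss = rng.standard_normal(500_000)
print(center_function(gauss))  # analytic value for a unit Gaussian: 1/sqrt(3)
print(tail_function(gauss))    # small, since few samples lie near 5 standard deviations
```

Averaging the weight over the samples avoids binning the data into a histogram first, which is where the gain in statistics comes from.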

We find that the fat tails almost disappear for $\tau > 1$ d in the SDRM, showing that
correlations are necessary for maintaining the fat tails for $\tau > 1$ d. This finding is consistent
with the results reported in ref. [10] and rules out the possibility that the daily Lévy-like
distribution is responsible for the persistence of fat tails. In fact, if this were so, the fat tails
would persist for $\tau > 1$ d even after shuffling the order of the returns $r_1(t)$, contrary to our
findings. One must conclude that the fat tails are mainly due to correlations. Note, however,
that for the SDRM, the fat tails do not disappear entirely and $p(r_\tau)$ never becomes truly
Gaussian even for $\tau \to 100$ d (business, not calendar, days), so a truncated Lévy distribution
of $r_1(t)$ is in principle not ruled out for $\tau = 1$ d [13, 28]. Also not ruled out is the distribution
suggested in ref. 
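
A short calculation illustrates why shuffling should thin the tails. It is a sketch that assumes the daily distribution has a finite fourth moment (as a truncated Lévy law does, but a pure Lévy stable law does not); the symbols $S_\tau$, $\sigma^2$, $\kappa_4$ and $\gamma_2$ are notation introduced here for illustration and do not appear in the original text. For independent, identically distributed daily returns, cumulants add:

```latex
% S_\tau = r(1) + ... + r(\tau), with independent, identically distributed r(t)
\begin{align*}
\operatorname{Var}(S_\tau) &= \tau \sigma^2 , &
\kappa_4(S_\tau) &= \tau \kappa_4 , \\
\gamma_2(\tau) &\equiv \frac{\kappa_4(S_\tau)}{\operatorname{Var}(S_\tau)^2}
  = \frac{\tau \kappa_4}{\tau^2 \sigma^4}
  = \frac{\gamma_2(1)}{\tau} .
\end{align*}
```

Dividing $S_\tau$ by $\sqrt{\tau}$ leaves the excess kurtosis $\gamma_2$ unchanged, so under these assumptions the excess kurtosis of shuffled data would be expected to fall off as $1/\tau$, whereas data with correlated volatility are not bound by this argument.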

Our most important finding is that the fat tails remain intact for the SSRM, showing that
the fat tails can persist for $\tau > 1$ d when the data lack sign correlations but have long-range
correlated absolute values. This finding proves that whatever the choice of the distribution
$p(r_1)$ of daily returns, long-range correlations in the volatility are necessary to explain the
behavior of $p(r_\tau)$.

An important consequence of these findings is that great care must be taken when trying
to study the distributions $p(r_\tau)$ independently of the correlations. Our findings show that
this is true for $\tau > 1$ d, and it is possible that a study of higher frequency data, with many
daily data points, would show similar behavior for $\tau < 1$ d. The lower renormalization cutoff
could conceivably be as small as the resolution of the data set, even as small as 10 s for a
high volume American stock.

We also find that the central bell curve region of the distribution of returns is more
similar to that of the SDRM than to the SSRM for $\tau > 1$ d, showing that in this region the
real data are more similar to a Gaussian and that Markovian sign correlations in the returns
are important in maintaining the Gaussian-like appearance. Finally, we also find that the
behavior of the distribution $p(r_\tau)$ is remarkably similar for different $\tau$.

The new results reported here are of broad interest and scientifically important because
long-range correlations and fat tailed distributions can be found in many physical, chemical,
and biological phenomena. Moreover, the prices of many financial derivative products
depend only on the distribution of returns. The existence of long-range volatility correlations
and fat tailed distributions underlying financial time series has been known for some time.
What was not fully understood is the origin of the fat tails, which turn out to persist for
large lags mainly because of long-range correlations in the volatility. Note that exponentially
decaying (i.e., not power law) correlations cannot lead on their own to fat tails at large lags.

A systematic study of the S&P 500 index by Gopikrishnan et al. [10] found that the
observed scaling of the distributions is due to time dependencies. Here, we have shown that
shuffling only the signs, but not the absolute values, of the returns allows the fat tails to
persist; hence the fat tails are due to volatility correlations rather than any other kind of
time dependency.

We comment on the finding that the real data are more similar to the SDRM in the central
bell curve region, but more similar to the SSRM in the fat tailed region. Fig. 3 shows that
the absence of volatility correlations causes a collapse of the fat tails in the SDRM, but that
the absence of sign correlations causes large deviations in the central bell curve region in the
SSRM, for $\tau > 1$ d. The real data appear qualitatively somewhere in between the SDRM and
the SSRM. The implication is that the error of neglecting sign correlations can somehow
"compensate" for the error of ignoring long-range volatility correlations for small price changes.
As a result, for $\tau > 1$ d the central region is deceptively well described by a Gaussian.
The two errors cancel each other out, hence the success of Bachelier's century-old Gaussian
theory of stock returns. The Gaussian assumption, known to be only approximate, was
nevertheless the basis of the Economics Nobel Prize in 1997 (awarded for the work that led
to the Black-Scholes Options Pricing Theory). The main flaw in the Gaussian theory is
that it cannot explain the fat tails that point to relatively rare but large events, such as the
great correction of 1987. There are other known phenomena in which the cancellation of
two errors has led to surprisingly good models, a classic example being the original theory
of polymer melts by Flory, in which the errors in estimating repulsive and attractive
energies could effectively "cancel" each other out, such that the model became better than
would otherwise be expected.

In summary, our findings indicate that the fat tailed distributions of stock returns are
mainly due to volatility correlations. More generally, we have shown that fat tailed
distributions can arise from long-range correlations in the absolute value of any time series. We
note that the shuffling techniques we have developed here are general and can be applied to
the study of time series generated by other dynamically rich complex systems that present
similar challenges.

We thank the Brazilian agencies CNPq (for support) and CAPES (for funding the visit 
to Brazil of MS) and P. Gopikrishnan, P. Ch. Ivanov and H. Eugene Stanley for very helpful 
comments. 

FIG. 1: S&P 500 index versus business time (in days), shown on a base 10 logarithmic scale, offset
to zero. Also shown are the SDRM, which has completely uncorrelated returns but an identical
$\tau = 1$ d distribution, and the SSRM, which has an identical probability distribution of the
absolute daily returns, but lacks sign correlations.

FIG. 2: Symmetrized probability density distribution $p(r_\tau)$ of the returns measured over periods
$\tau = 10$ d for 59 stock indices. Also shown are the SSRM and SDRM for $\tau = 10$ d. These
distributions are typical of $\tau > 1$ d. We find there is a fat tail in SSRM but not in SDRM,
indicating that the origin of the fat tails lies in known long-range correlations in the absolute
returns. Inset follows a linear (not semilog) scale. The distribution of $r_1$ has been normalized to
unit variance. For $\tau = 1$ d all three distributions are identical (not shown).

FIG. 3: (a) Mean characteristic function $f(\tau)$ for 59 stock indices, along with the SSRM and SDRM
controls, focusing on the fat tails. The values for the SDRM become significantly lower with $\tau$,
indicating that the tails are less fat for the shuffled data. The values for the SSRM, however, are
remarkably consistent with the original data, showing that known long-range volatility correlations
are the real cause of the observed non-Gaussian fat tailed distributions. The loss of the fat tail for
the SDRM thus rules out a true Lévy stable distribution. (b) Mean characteristic function $g(\tau)$ for
the same datasets. Note that the SSRM has many more returns near zero for $\tau > 1$ d, leading to
a higher value of $g$. This result shows that sign correlations in the real data play an important role
that counteracts the volatility correlations. Another result seen in (a) and (b) is that an equivariant
Gaussian approximation is extremely good for small $r$ (as seen from $\Delta g/g \approx 20\%$), but very bad
for large $r$ (since $\Delta f/f \approx 1000\%$), a finding potentially important for options pricing theory.